This repository has been archived by the owner on Jun 12, 2023. It is now read-only.

ara/deterministic-timing-rework #1790

Draft · wants to merge 4 commits into master

Conversation

@anthonyra (Contributor) commented Aug 5, 2022

I needed to add a .devcontainer since I use an M1 Mac

new and improved

better logs

be more aggressive and check block timer on set

better logs

moar logs

sync local to remote

Pick most severe AvgBlockTime

config fixes

read_timer instead

tweaks to logic

whoops

better time selection

for workcation

attempt at making different block history

ugh
@anthonyra (Contributor, Author) commented Aug 15, 2022

Some added context: the attempt here is to include a lazy vote on the next block time for CG members, using BBA metadata to carry the vote. In doing so, validators are always voting for the "next" block.
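A rough sketch of that idea (the function and key names below are hypothetical, not the actual blockchain-core API): each CG member attaches its locally computed next block time to the BBA metadata it already submits, and the group resolves the collected proposals by taking the most severe (slowest) one, falling back to the local calculation when no proposals are present.

%% Sketch only: names here are illustrative, not the real blockchain-core API.
%% Each CG member attaches its locally computed next block time to its BBA metadata.
propose_block_time(Metadata, LocalBlockTime) ->
    maps:put(next_block_time, LocalBlockTime, Metadata).

%% When the round completes, resolve the proposals by picking the most severe
%% (slowest) proposed time; with no proposals, fall back to the old local calculation.
resolve_block_time(MetadataList, LocalFallback) ->
    case [T || #{next_block_time := T} <- MetadataList] of
        []        -> LocalFallback;
        Proposals -> lists:max(Proposals)
    end.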

Things to consider

  • Since the next block is the one being voted on, a validator defaults to the old block time calculation and upgrades to the group time when applicable
  • I also found that the old calculation led to some very "annoying" behavior depending on block history: as a validator's block history grows, it becomes significantly less sensitive to the current block time, which means a slower reaction to the "current" state of the blockchain. For example, if a validator has 50k blocks of history with a 60.2 second average and an 80 second block is added, the response is "weak" (60.200396 new average). If it instead had only 300 blocks of history at a 60.2 second average, adding that same 80 second block would move the average much "faster" (60.26578073089701 new average). See the sketch just below this list.
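The numbers in that example follow from a plain running mean; a minimal worked check, assuming the average is updated incrementally as OldAvg + (Sample - OldAvg) / (Count + 1):

%% Minimal check of the sensitivity example above, assuming a simple running mean.
running_mean(OldAvg, Count, Sample) ->
    OldAvg + (Sample - OldAvg) / (Count + 1).

%% 50k blocks of history barely react to an 80 second block:
%%   running_mean(60.2, 50000, 80) -> approx. 60.200396
%% 300 blocks of history react noticeably faster:
%%   running_mean(60.2, 300, 80)   -> approx. 60.26578073089701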

In response to the second bullet, I "hacked" together an upper and lower bound for block history. The upper bound is greater than the snapshot size but less than 50k; the lower bound is the snapshot size. It then performs a similar calculation as before but chooses the most aggressive "slow" average for block times. It also tries to be most aggressive at the beginning of an epoch and then tapers down toward the target block time. This is to ensure that all nodes are at roughly the same level of "work" when approaching those thicc reward blocks.
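A minimal sketch of that bound-and-compare idea, with hypothetical window sizes and function names (not the exact code in this branch): average the recent block times over the short (snapshot-sized) window and over the capped longer window, then keep whichever average is slower.

%% Sketch only: names and window sizes are illustrative, not the exact branch code.
%% Average over the most recent N block times, clamped to the history we actually have.
window_avg([], _N) ->
    0.0;
window_avg(BlockTimes, N) ->
    Window = lists:sublist(BlockTimes, min(N, length(BlockTimes))),
    lists:sum(Window) / length(Window).

%% Compare the snapshot-sized window with a capped longer window and keep the most
%% severe ("slow") average, so deep history cannot mask current slowness.
bounded_block_time(BlockTimes, SnapshotSize, UpperBound) ->
    max(window_avg(BlockTimes, SnapshotSize),
        window_avg(BlockTimes, UpperBound)).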

TODOS:

  • Test with a group that has different block time histories

%% if it's large, jam on the gas
catchup_time(_) ->
    10.
%% try and catch up within 10 blocks, max 10 seconds
Contributor

I'd suggest leaving the catchup_time formulas alone for now, until the block time voting mechanism is widely deployed. That way, while there is a mix of validators using and not using the new voting mechanism, they will all at least be using the same calculation for what the next block time should be. Then, as a post-adoption optimization, I think the proportional catch-up I suggested would actually be better, as it would provide more consistency rather than these sporadic 50s blocks. In my opinion, ten 59 second blocks are better than one 50 second block.
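For reference, a rough sketch of what a proportional catch-up could look like (purely illustrative, not the actual #1557 change): shave off lag in proportion to how far behind the chain is, capped so blocks stay only slightly below the 60 second target instead of dropping to 50 seconds.

%% Illustrative sketch of a proportional catch-up, not the actual #1557 logic.
-define(MAX_CATCHUP_PER_BLOCK, 3).  %% never run more than 3 seconds fast

proportional_catchup(LagSeconds, BlocksToSpreadOver) when LagSeconds > 0 ->
    min(?MAX_CATCHUP_PER_BLOCK, LagSeconds div BlocksToSpreadOver);
proportional_catchup(_LagSeconds, _BlocksToSpreadOver) ->
    0.
%% The next block time would then be roughly 60 - proportional_catchup(Lag, N),
%% yielding many 57-59 second blocks instead of a handful of 50 second blocks.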

Contributor Author

After some forced halts and restarts, the proportional and original catch-up calculations were far more aggressive than what I proposed. I guess I forgot to mention that it still tries to reach 60 seconds over the course of an epoch; however, the quicker block times occur early in the epoch rather than later. It also leads to less bouncing from the "hunting" mechanism's sensitivity.

-include_lib("eunit/include/eunit.hrl").

%% Confirm changes to make catchup_time proportional
catchup_time_test() ->
Contributor

These unit tests were specific to my changes in #1557. Remove them if you are not using that logic, or they will not work.

Contributor Author

Oh yeah, I haven't gotten to "tests" yet, but figured I'd post this to get some more eyes on it.

end;
false ->
{0, undefined}
%% mimic snapshot_take functionality for block range window
Contributor

What's the benefit of looking at the very short ("snapshot_take") block time average here? I fear this is going to add a lot of volatility to block times (in addition to adding complexity to this code). I believe you are trying to catch up quickly. However, to what benefit? I think it would be preferable to catch up over a longer period of time so there is less volatility in times from block to block. There is a high chance that this period includes multiple rewards blocks, which the algorithm will try to compensate for quickly; being exceptionally quick is, in my opinion, unnecessary.

@@ -0,0 +1,22 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:
Contributor

Can't you add this file to .gitignore or something? Or just not add it to your commit?

@PaulVMo (Contributor) commented Aug 18, 2022

Good stuff Anthony. Thanks for working on this. This issue has plagued the community for too long, and everyone (myself included) will be happy to see it abolished. Gone will be the days of meticulously worrying over how much block history is on a validator.

I did add some review comments. I am not sold on the changes to make reactions "faster". We are talking about timeframes of 30 days / ~50K blocks to make up for lost blocks / emissions; I do not think we need to make all of that up so quickly, especially with reward blocks taking several minutes. I think we will end up seeing a long reward block, followed by five to ten 50 second blocks and then 20 or so 60s blocks, which then repeats. That puts undue pressure on validators to catch up so quickly all the time. I think we'd be better off with the long window and letting validators catch up over the whole epoch (or multiple epochs): say, a long reward block followed by 30 blocks at 57 or 58 seconds, rather than slamming on the gas for a few blocks and then backing off.
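As a quick arithmetic check on those two patterns (assuming a 60 second target): five to ten 50 second blocks recoup roughly 50-100 seconds of lag in a burst, while thirty blocks at 57-58 seconds recoup roughly 60-90 seconds with a far smaller per-block swing, so the spread-out approach gives comparable catch-up with much less volatility.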
